For a quick overview of what a meta-analysis (MA) is, please watch this 3-minute video:
Meta-analyses (MAs) can be useful for several purposes, including: (1) theory building and evaluation and (2) practical decisions during study design. This section starts with some basics on why MAs are more useful than single studies for those two purposes.
When thinking about development, we often look at published experiments testing whether infants have specific abilities, for example whether infants treat native vowels differently from non-native ones, and how those abilities change with age (for more details on this specific topic see https://langcog.github.io/metalab2/dataset/inphondb-native.html).
The results of a single experiment cannot answer those questions once and for all: Each experiment measures the behavior of a set of infants in a very specific situation, which might not be generalizable to other situations. Moreover, there might be measurement error in this one-time snapshot of reality. Finally, the literature likely contains some false positives and false negatives.
With a significance threshold alpha set to .05, every study we run on an ability that infants do not actually have has a 5% chance of telling us that they have it - this is a false positive. This likelihood becomes bigger when researchers engage in seemingly innocent and possibly common practices that increase the chance of a false positive, such as running multiple analyses until one is significant. Some people even propose that most published literature consists of false positives!
With beta set to .2 (that is, 80% power), every study we run on an ability that infants do have has a 20% chance of telling us that they cannot do it - this is a false negative. In fact, many studies are underpowered, so that a non-significant result is not due to a true lack of effect, but rather to a lack of power to detect it.
None of these are necessarily due to bad intentions, wrongdoing, or even poor research practices. Reality is complex and thus any one study can only give us a single, noisy estimate.
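To get a feel for how quickly "running multiple analyses until one is significant" inflates the false-positive rate, here is a minimal Python sketch. It assumes that the analyses are statistically independent, which is a simplification (analyses of the same dataset are usually correlated), and the function name is ours.

```python
def chance_of_false_positive(alpha: float, n_analyses: int) -> float:
    """Probability that at least one of n independent analyses is
    significant at threshold alpha when there is no true effect."""
    return 1.0 - (1.0 - alpha) ** n_analyses


# How the risk grows as more analyses are tried on a null effect.
for k in (1, 3, 5, 10):
    print(f"{k:2d} analyses: {chance_of_false_positive(0.05, k):.3f}")
```

With a single analysis the rate stays at alpha; every additional attempt pushes it higher.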
MAs may be the cheapest way to assess generalizability and test whether a certain factor matters. Instead of running 10 experiments, 1 on each vowel contrast, we collect 10 studies in the literature into a single analysis! If effects for e.g. native versus non-native vowels differ significantly in the literature as a whole, then we can be more confident that results will generalize to unobserved vowel contrasts.
Collecting many study results from different researchers is a way to try and make up for the possibility that biases influenced the outcome. We can even use MAs to check for biases, such as asking whether a suspicious number of p-values is just below the significance threshold or whether results are systematically skewed in one direction. Why biases matter is wonderfully illustrated here: http://www.alltrials.net/news/the-economist-publication-bias/. Checking for biased results is a whole literature on its own, but as a start tools such as p-curving apps are easily available for every researcher. http://www.p-curve.com/ or http://shinyapps.org/apps/p-checker/ are two well-documented examples.
MAs help in 3 ways. First, by pooling data together, we may be able to bring out a small effect that was too difficult to detect in any single study.
Additionally, we often do not know about these non-significant findings because it is quite difficult to publish them. Community-augmented MAs like those in MetaLab provide a home for non-sexy results, and allow researchers to benefit from the experience of others.
Finally, MAs help us in experiment design so we can avoid false negatives due to low power. When the size of an effect is known and with a fixed significance threshold, calculating power is straightforward. Here is a simulation of how all ingredients fit together: <rpsychologist.com/d3/NHST/>.
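As a rough illustration of how effect size, sample size and significance threshold combine into power, here is a minimal Python sketch for a two-group, two-sided comparison. It uses a normal approximation to the t-test, so it is slightly optimistic for small samples; for a real study, dedicated power software is the safer choice. The function name and example values are ours.

```python
from statistics import NormalDist


def approx_power(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power to detect a standardized mean difference d
    with n_per_group infants in each of two independent groups."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic if the true effect is d.
    shift = d * (n_per_group / 2) ** 0.5
    # Probability of landing beyond either critical value.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Power for a medium-sized effect at several sample sizes.
for n in (16, 32, 64):
    print(f"n = {n:3d} per group: power = {approx_power(0.5, n):.2f}")
```

When d is 0 the function returns alpha, which is the false-positive rate described earlier.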
The MAs in MetaLab can also help with study design, because often many design variables have been coded. Examples include the words or sounds used, how long trials were, etc. Instead of doing a tiresome literature review, you can find out what the most common procedure is or which one is associated with the biggest effect.
In MAs (meta-analyses) we express the outcome of a single experiment in a way that captures how big an effect is and how much it varies. There are 3 groups of effect sizes:
1. effect sizes based on means, which includes Cohen’s d on which we focus from here on;
2. effect sizes based on binary data; and
3. effect sizes based on correlations.
Since most developmental studies in the lab use mean responses of two groups or of the same infant in two (or more) conditions, Cohen’s d is the appropriate effect size measure. This chapter (Chapter 3) and the following ones provide a gentle introduction to effect sizes. Cohen’s d is based on standardized mean differences. To get a feel for Cohen’s d, we highly recommend playing with the visualization by RPsychologist. A list of recommended readings is also provided at the end of this document.
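To make "standardized mean difference" concrete, here is a minimal Python sketch of Cohen’s d for two independent groups, using the pooled standard deviation. The numbers in the example are made up, and the function name is ours; MetaLab’s own calculations are done in R.

```python
import math


def cohens_d(mean_1: float, sd_1: float, n_1: int,
             mean_2: float, sd_2: float, n_2: int) -> float:
    """Cohen's d: difference between two group means divided by
    the pooled standard deviation of the two groups."""
    pooled_var = ((n_1 - 1) * sd_1 ** 2 + (n_2 - 1) * sd_2 ** 2) / (n_1 + n_2 - 2)
    return (mean_1 - mean_2) / math.sqrt(pooled_var)


# Made-up looking times (in seconds) for two groups of 20 infants.
print(cohens_d(mean_1=10.0, sd_1=2.0, n_1=20, mean_2=8.0, sd_2=2.0, n_2=20))
```

Because the difference is divided by the standard deviation, d is unit-free and can be compared across studies that used different measures.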
In a typical infant study, babies might hear two types of trials and the responses to each are compared. In most papers, it is sufficient that the difference between the trial types reaches statistical significance, but in a meta-analysis we care about the size of this single observed effect and its variance. This allows us to pool over several studies, weight each data point, and arrive at an estimate of the underlying, true effect. This then allows us to calculate power and check how effect sizes might be systematically affected by variables such as infant age in “moderator analyses”.
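The pooling and weighting step can be sketched in a few lines of Python. This is the simplest possible version, a fixed-effect inverse-variance average, shown only to illustrate the idea; real meta-analyses usually fit random-effects models with dedicated software. The effect sizes and variances below are made up.

```python
import math


def pool_fixed_effect(effects, variances):
    """Inverse-variance weighted mean of effect sizes and its standard error.
    Precise studies (small variance) receive large weights."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * d for w, d in zip(weights, effects)) / sum(weights)
    standard_error = math.sqrt(1.0 / sum(weights))
    return pooled, standard_error


# Three made-up studies: effect size d and its sampling variance.
effects = [0.2, 0.6, 0.4]
variances = [0.04, 0.04, 0.01]
pooled, se = pool_fixed_effect(effects, variances)
print(f"pooled d = {pooled:.3f}, SE = {se:.3f}")
```

The standard error of the pooled estimate is smaller than that of any single study, which is why pooling yields more precise estimates.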
Recommended further readings for an introduction to effect sizes:
Choose the appropriate level of detail for your MA topic. The topic of your meta-analysis should be broader than the one of a single experiment (e.g. “How do babies segment words of different stress patterns?”), but narrower than a whole research field (e.g. “How do babies learn language?”). The goal is to be able to gather comparable papers, measuring consistent dependent variables, to allow you to compute a common statistical metric (i.e. effect size) from them.
Define your population of interest precisely. Homogeneous can mean many things: age, language, typical versus atypical development. You may run a meta-analysis where you accept many different levels for some of the variables and see how they affect results, defining them as MA moderators, for example seeing if effects are consistent across ages. There should still be some unifying element in your studies, though, so that your meta-analysis yields one broad result.
Consider the number of available studies on your topic. Your MA topic also depends on how many studies have been done on it. If you want to run a simple comparative MA, as few as two studies could be okay. But if you want to run an analysis with a lot of moderators, 5 studies probably isn’t enough to warrant a meta-analysis.
It is important that you build traceability of your work from the start, particularly since in larger MAs other people may finish up the work or you want to check later on why you decided to exclude a given paper. So to make sure that all of your decisions are recorded and clear, make a copy of this decision spreadsheet. Don’t forget to rename it, to give us a “viewing” link, and clean it up as follows.
Step 1: Click on “File” and select the “Make a copy…” option
Step 2: In the window that appears, change the name to something like “MA_TOPIC”
Step 3: Click on the blue button “Share” on the top right.
Step 4: In the menu, click on “Get shareable link” on the top right
Step 5: Copy the link and send it to us.
Step 6: Clean up
The model spreadsheet contains some fake entries and notes. To avoid confusion, we recommend removing the instructions found on the top lines of each sheet and the fake information that is already entered, with one exception: the pink columns (A, B and W) in the Relevant_studies_search sheet contain formulas that may be useful to you. So you might want to delete the contents of the other columns and keep the pink ones in order to reuse the formulas.
Additionally, make a copy of this flowchart, rename and share as you did for your spreadsheet above. This figure gives you an overview of the process, and you will be filling in the boxes with the right numbers as you go along so that people who continue this MA and/or those interested in assessing this work can make sure that you followed the procedure.
Probably not. In this step, you will go through the initial list you put together in step 2, and make decisions to include/exclude papers, mostly based on the abstract. In addition to creating your sample for data entry (step 4) you will start honing your inclusion criteria. Typically, these will include:
a homogeneous scientific question: Make sure you have clearly defined the purview of e.g. cross-situational learning (e.g., this name itself is vague to those outside the domain, so define it in a more specific way: “exposure to sets of images paired with wordforms with the goal of studying word-form image association, but crucially multiple images are shown at once (unlike e.g. the switch procedure)”)
a homogeneous infant population: Typically-developing children, between the ages of XX and YY (the precise ages may stem from your seminal paper; perhaps to start with, you could set the maximum to 36 months, the minimum to 0 months); consider whether you also need to restrict the sample based on infants’ native language for theoretical reasons
The last one is perhaps the trickiest. Staying close to your seminal paper will allow you to reduce the amount of variation in your sample due to methodological “details”, and to make it easier for yourself to enter data, because all the results will be structured in similar ways. But it’s important to know that this is a potential source of bias. For instance, you could decide that you will only input data using a specific kind of artificial language because you know that papers not using this language have smaller effects. This will end up being a self-confirmation exercise – unless there are a priori strong theoretical reasons to exclude other kinds of language or to assume that the learning mechanisms attributed to the infant cannot be generalized to these other languages.
Every time you make a decision regarding these and other key criteria, remember to note it in your decision spreadsheet, in the last sheet called “Notes_inclusion”. For example, mine looks like this:
| Question | Decision | Date |
|---|---|---|
| a homogeneous scientific question | learning of speech sound categories, where the categories are represented by a multimodal versus unimodal distribution of acoustic correlates | 10/19/2015 |
| a homogeneous infant population | typically-developing children, between the ages of 0 and 36 months | 10/19/2015 |
| a homogeneous procedure | passive exposure in the lab, testing via any behavioral or non-behavioral method | 10/19/2015 |
The goal of this step is to put together a list of publications that you will look at and consider for inclusion. In a typical MA, you make the most comprehensive list possible in order to answer a specific research question and/or to cover a given phenomenon. This typically means going through 1,000 abstracts and reading 100 papers in full. You can start with the seminal paper for your effect of interest, and then look for the studies citing your seminal one. Use PubMed’s search to find your pivot study’s entry, for instance by copy-pasting the full paper title in the builder:
When you press “search”, usually you’ll find the entry for your seminal paper (or if the title was not unique, you might need to click on one of the entries found until you do come across the entry for your seminal paper). Notice on the right a section entitled “Cited by …” Scroll down to click on the link at the bottom of this section stating “See all..”
You will now see all studies citing your seminal one. Constrain it further by clicking on “Show additional filters” on the left, and checking the box for “Infant: birth to 23 months”:
You now want to save all these papers in your reference management software. If you use Zotero: Click on the drawing of a folder in your status/search bar. When you do so, a window will pop up with all the results for that PubMed page:
Click on “select all” and “OK”. Repeat for the other search pages. This will store the citation information, including the abstract, in Zotero.
You can also interrogate PubMed with a script, such as the one we have prepared.
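If you would like to see what scripted access can look like, here is a minimal Python sketch that only builds the query URLs for NCBI’s public E-utilities service; it is not the script mentioned above. It assumes the `esearch` endpoint for finding a paper by title and the `elink` endpoint with the `pubmed_pubmed_citedin` link name for listing citing papers. The title and PubMed ID in the example are hypothetical placeholders.

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def title_search_url(title: str, retmax: int = 20) -> str:
    """URL that searches PubMed for entries whose title matches `title`."""
    params = {"db": "pubmed", "term": f"{title}[Title]",
              "retmax": retmax, "retmode": "json"}
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


def cited_by_url(pmid: str) -> str:
    """URL that lists PubMed entries citing the paper with this PubMed ID."""
    params = {"dbfrom": "pubmed", "linkname": "pubmed_pubmed_citedin", "id": pmid}
    return f"{EUTILS}/elink.fcgi?{urlencode(params)}"


# Hypothetical title and PubMed ID, for illustration only.
print(title_search_url("Title of your seminal paper"))
print(cited_by_url("12345678"))
```

Opening these URLs in a browser, or fetching them from a script, returns the matching PubMed IDs, which you can then import into your reference manager.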
What if the title and abstract don’t allow me to decide?
Then play it safe and include the paper to check based on the full text.
What if the title and abstract don’t allow me to decide, but in fact I know the paper and I know it needs to be excluded?
Then you probably have already seen the full text of the paper, so say “yes” for the screening decision, and then “no” for the full-text decision.
Ideally, you would enter everything: published or unpublished, proceedings or journal, etc. However, sometimes you may want to start a “seed” meta-analysis that just gives a rough idea of an area.
In this case, how large should your sample be? Mika and Molly have done some simulations to help you decide. By and large, it looks like the more, the better – clearly estimates get more precise (confidence intervals narrow) as more papers are entered. Based on this information, we are proposing a minimum of 10 included experiments as a pragmatic first step, knowing that your estimate is not very precise.
We are hoping that eventually all of your MAs (meta-analyses) will be included in MetaLab, so we ask you to use the MA template (create a copy), and follow the field specifications to ensure compatibility. (Note that right now, winter 2018, these specifications are biased towards language acquisition research. If you work on another topic, we would be thrilled if you helped us expand and adapt MetaLab).
Ideally, you would code all potentially relevant moderator variables (e.g., experimental manipulations) in addition to the core characteristics (columns in red; e.g., number of participants). However, in the interest of time, you can get started with the core characteristics only. Remember once more to give us viewing rights.
One of my papers has a single experiment but involves both Spanish and English speakers who are tested on a native and a nonnative speech sound contrast. Should that count as 4 experiments (2 languages x 2 contrasts)?
How many rows you make depends on how the results are reported. In this case, the authors report the outcome separately for all four groups. Therefore, please enter the four groups separately; each into their own row. You can copy over descriptions of the experiment.
In Experiment 1, there are two age groups. Do I have to report the age for both groups or do I average both groups into one? If I have to report both groups, how do I report this in the input form?
How many rows you make depends on how the results are reported. In this case, the authors report an average outcome over both age groups, since they did not find a significant difference between the two groups. Therefore, please enter only one row and calculate the average age. If the results were reported separately per age group, make a
In a typical full MA, you go through the whole list and only then start entering. The procedure is as follows. Go back to your spreadsheet, and for each study that has been decided as a “yes” during screening, try to retrieve the full text for the paper as you normally would (e.g., search through scholar.google.com; regular google; your institution’s library, etc.). If you cannot retrieve it, update the Relevant_studies_search sheet of your spreadsheet to mark this paper as “no” in column F, entitled “Fulltext_retrieved”. If you want, you can contact the authors to try to get the full text from them, in which case you can note this in column G.
If you do find the full text, go through the paper to find the first experiment reported. You will enter all experiments and conditions one at a time, and fill in their information in the MA spreadsheet you created in step 4.
IMPORTANT: You should work backwards from the results section: look at what dependent measures are reported fully enough that you will be able to extract an effect size from them.
The following information allows one to calculate an effect size (we are sticking to experimental designs, since most of our MAs are experimental):
between-participant studies: Means and SDs (not SEs!) of the dependent variable for each infant group are all that is required for the calculation of Cohen’s d. Sometimes, means and SDs are not available as numbers. If there are clear figures, you can try to estimate means and SDs using this online app. If you decide to estimate values from figures, add a column to keep track of this. Finally, t or F values for the main effect in combination with sample sizes can be used to calculate Cohen’s d. Note them when available.
within-participant studies: Effect sizes for this type of study are calculated the same way as in between-participant studies, but in order to calculate the weight of these studies the correlation between the first and second measurements is required (to account for the amount of within-participant variation). Since this measure is usually not reported, we provide below median and range for correlations found in existing MAs.
Infant word segmentation from native speech: 0.641 (range: 0.140 to 0.921)
Infant vowel discrimination (native and nonnative): 0.496 (range: -0.413 to 0.855)
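The calculations described in this list can be sketched in Python as follows. These are common textbook approximations, not necessarily the exact formulas used in MetaLab’s R code, and the function names and example values are ours. The sketch covers converting a reported SE to an SD, recovering Cohen’s d from a between-participant t value, and computing the sampling variance of d for a within-participant design, where the correlation between the two measurements enters.

```python
import math


def sd_from_se(se: float, n: int) -> float:
    """Convert a reported standard error of the mean into a standard deviation."""
    return se * math.sqrt(n)


def d_from_t_between(t: float, n_1: int, n_2: int) -> float:
    """Cohen's d from an independent-samples t value and the two group sizes."""
    return t * math.sqrt(1.0 / n_1 + 1.0 / n_2)


def var_d_within(d: float, n: int, corr: float) -> float:
    """Approximate sampling variance of d in a within-participant design;
    a higher correlation between the two measurements gives a smaller variance."""
    return (1.0 / n + d ** 2 / (2.0 * n)) * 2.0 * (1.0 - corr)


# Made-up study: d = 0.5 from 20 infants, using the median correlation
# reported above for word segmentation (0.641) as an imputed value.
variance = var_d_within(d=0.5, n=20, corr=0.641)
print(f"variance = {variance:.4f}, weight = {1.0 / variance:.1f}")
```

The weight of a study in the meta-analysis is the inverse of this variance, which is why an imputed correlation directly affects how much a within-participant study counts.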
When entering papers, please remember a key thing: all analyses are done by machines, and machines cannot read text! So if a column is “numeric”, please do not enter things that aren’t numbers (such as text, spaces, ~, etc). This is particularly important for the dependent measures!
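A quick way to catch such problems before sharing your spreadsheet is to check each numeric cell with a short script. Here is a minimal Python sketch; the rule it applies (plain integers or decimals only, with “NA” tolerated for missing values) is our assumption about what counts as a clean numeric entry, and the function name is ours.

```python
import re

# Optional minus sign, digits, optional decimal part; nothing else allowed.
NUMBER_PATTERN = re.compile(r"-?\d+(\.\d+)?")


def is_clean_numeric(cell: str, allow_na: bool = True) -> bool:
    """True if the cell contains only a plain number (or 'NA' when allowed).
    Surrounding spaces, '~', units or other text make the cell invalid."""
    if allow_na and cell == "NA":
        return True
    return NUMBER_PATTERN.fullmatch(cell) is not None


# Made-up cell contents as they might appear in a numeric column.
for cell in ["12.5", "-0.413", "~12", "12 ", "twelve", "NA"]:
    print(repr(cell), is_clean_numeric(cell))
```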
At this stage, you might find that a given paper does not contain the right information for being included. In this case, you can and should exclude it. If you have already started entering it, you can leave the information you entered and put in “comments” that the entry is incomplete (although if you followed our advice above, you won’t have wasted time entering it!). Remember to update your spreadsheet with each paper you read and made a decision on.
The article I enter has 3 experiments, and the first is with adult participants. Do I need to enter this experiment?
No, please only enter the infant/child experiments.
The sound stimuli differ approx. 6 ms in length, but the experiment is not about length differences. Do I have to report this difference although it is very small?
In case there’s a column for stimulus length, please report it. You are right that this experiment is not about length differences, but having the information cannot hurt, and eventual analyses will reflect that the difference is very small.
The article reports a table with the lengths of each individual stimulus. Should I calculate and report the average value?
Yes, please report the averaged value in the appropriate column.
I am entering an article with the HAS method. The authors report results for both the 2 and 4 minutes after the test phase has started. Your example only reports the results after 2 minutes, but would you still want me to report both?
It is often the case that articles report more than one type of result. Please just report the ones that we also provide in the example file!
We use R to calculate effect sizes. Visit https://github.com/langcog/metalab2 for our code.
We recommend the following for an introduction to effect sizes:
Textbooks are great to get a basic overview of how to calculate effect sizes. We consulted: Lipsey, M. W. & Wilson, D. B. (2001). Practical meta-analysis. Thousand Oaks, CA: Sage.
A great primer and a spreadsheet document to calculate effect sizes by hand can be found via: Lakens, D. (2013). Calculating and Reporting Effect Sizes to Facilitate Cumulative Science: A Practical Primer for t-tests and ANOVAs. Frontiers in Psychology, 4:863. Materials on OSF
Since textbooks do not cover every possible question that different meta-analysts may encounter, we turned to articles for more specific questions. We found this article useful for considering the comparability of effect sizes from within- and between-participant designs: Morris, S. B., & DeShon, R. P. (2002). Combining Effect Size Estimates in Meta-Analysis With Repeated Measures and Independent-Groups Designs. Psychological Methods, 7(1), 105-125. doi: 10.1037//1082-989X.7.1.105
Two groups of infants are tested and I treat them as two different entries, but the numbers of included and excluded infants are only reported as a whole over both groups. What do I do?
As the best approximation we can get, please divide the reported number by the number of groups (in your case 2).
The age of infants is reported in weeks, therefore I multiplied it by 7 to convert it into days. I read in your instructions that you have to multiply months by 30.42 to get a proxy for days. So my question is whether I have to multiply by a different number than 7 to get a proxy for days?
No, that’s fine the way you did it!
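The two conversions mentioned here (weeks times 7, months times 30.42) can be wrapped in a small helper. This is a minimal Python sketch; the function and unit names are ours.

```python
# Conversion factors to days, as given in the instructions.
DAYS_PER_UNIT = {"days": 1, "weeks": 7, "months": 30.42}


def age_in_days(value: float, unit: str) -> float:
    """Convert an age reported in days, weeks or months into days."""
    if unit not in DAYS_PER_UNIT:
        raise ValueError(f"unknown unit: {unit!r}")
    return value * DAYS_PER_UNIT[unit]


print(age_in_days(10, "weeks"))
print(age_in_days(6, "months"))
```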
In some cases you will still need to contact the authors of the study. People probably don’t know you, so think about what in the subject line would make you open an email from a stranger. Something like “including your paper in a MA” should be motivating. People are busy: they don’t have time to read a lengthy email, especially from someone they don’t know, so be as concise as possible. You can still give them more details later if they ask for them. Don’t be shy: authors are likely to be happy to hear that someone is interested in their work and is going to cite them!
If you have already done a meta-analysis, you can easily add it to MetaLab. This tutorial explains how. But first, we go over why this is good for you and the community.
Contributing to MetaLab can have several advantages both for you and the community:
* Get more visibility for your MA. When publishing a paper, you hope that it will be read by as many people as possible. Placing it in a centralized repository such as MetaLab can help you reach this broad audience and give your meta-analysis more visibility.
* Increase the impact of your MA. As an author, you probably want your results to be fully understood by readers, and you want your readers to use your data as efficiently as possible. The interactive interface of the MetaLab website allows readers to better navigate in your meta-analysis results than the paper version, and to play with the results to better use them when planning experiments. MetaLab is an opportunity for your meta-analysis to make a stronger impact.
* Contribute to drawing the broader developmental picture. You made a meta-analysis to help the community draw a clearer picture about an effect of interest and contribute to theory assessment. MetaLab is a central platform that includes over 1040 effect sizes. Incorporating your meta-analysis in this larger dataset helps the community to have a better idea of cognitive development and language acquisition.
You remain the owner of your meta-analysis data: Users must cite your data by your preferred citation. If your data are previously unpublished then this doesn’t count as publication. Learn more by reading our full citation policy.
You can retain control for as long as you want to. In fact, two options exist for the curation and review of your data. You can choose to be the curator. This means you agree to be the person responsible for identifying new relevant papers and signaling them to the MetaLab data manager, who will add them to the database of the relevant MA. You would be expected to check data entry once in a while. Curators are part of the MetaLab board and get informed of discussions regarding e.g. site revamping. Alternatively, you can choose to step down completely, and it will be MetaLab’s job to assign a new curator for your dataset. In this case, we can still keep your photo on the wall of fame.
We prepared a spreadsheet that you must use to code your data. The key property of this spreadsheet is that it has one row for each effect size. Therefore you should fill in as many rows as the number of effect sizes you report in your meta-analysis.
* The first tab, called “Data”, should contain your data.
* The second tab, called “CodeBook”, contains all the explanations about the codes to be used when you fill the “Data” tab.
* The third tab, called “Methods”, contains all the possible options for the “method” column and their respective description.
Follow these steps to convert your data:
If you work from your paper, you may have an appendix or a table for each of your moderators. In this case, take them one by one and follow these steps:
Write the two references in each of the first three columns (study_ID, long_cite, short_cite), separated by commas. The “short_cite” column should be in the in-text citation format, i.e. Smith (2002, 2008), if the two papers are from the same authors. Fill the other columns as usual.
Fill the missing columns with “NA”. If it happens that you don’t have one of the mandatory columns, please let us know.
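If you convert your data with a script rather than by hand, filling the gaps can be automated. Here is a minimal Python sketch; the list of expected columns is only a hypothetical excerpt built from column names mentioned in this tutorial (the full list is in the “CodeBook” tab), and the example row is made up.

```python
def fill_missing(row: dict, expected_columns: list) -> dict:
    """Return a row containing every expected column, with 'NA'
    wherever the original row had no value for that column."""
    filled = {}
    for column in expected_columns:
        value = row.get(column)
        filled[column] = "NA" if value is None or value == "" else value
    return filled


# Hypothetical excerpt of the template's columns and a made-up row.
expected = ["study_ID", "long_cite", "short_cite", "method"]
row = {"study_ID": "Smith2002", "short_cite": "Smith (2002)", "method": ""}
print(fill_missing(row, expected))
```

Note that this helper only keeps the columns listed in `expected_columns`, so make sure the list is complete before running it on real data.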
Most journal articles are peer-reviewed; some conference proceedings (e.g., Cognitive Science) are peer-reviewed. Typically, book chapters, posters, and conference abstracts are not considered peer-reviewed because no reviewer has seen the full details of the methods.